Learning Accurate and Interpretable Decision Rule Sets from Neural Networks
This paper proposes a new paradigm for learning a set of independent logical
rules in disjunctive normal form as an interpretable model for classification.
We formulate the problem of learning an interpretable decision rule set as
training a neural network with a specific, yet very simple, two-layer
architecture. Each neuron in the first layer directly maps to an interpretable
if-then rule after training, and the output neuron in the second layer directly
maps to a disjunction of the first-layer rules to form the decision rule set.
Our representation of the neurons in this first (rules) layer enables us to encode
both the positive and the negative association of features in a decision rule.
State-of-the-art neural net training approaches can be leveraged for learning
highly accurate classification models. Moreover, we propose a sparsity-based
regularization approach to trade off classification accuracy against the
simplicity of the derived rules. Our experimental results show that our method
can generate more accurate decision rule sets than other state-of-the-art
rule-learning algorithms with better accuracy-simplicity trade-offs. Further,
when compared with uninterpretable black-box machine learning approaches such
as random forests and full-precision deep neural networks, our approach can
easily find interpretable decision rule sets that have comparable predictive
performance.
Comment: Published at AAAI 202
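The two-layer structure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the rules layer has already been trained and binarized into hypothetical ternary weights (+1 = the feature must be 1, -1 = the feature must be 0, 0 = the feature is ignored), and it leaves out training and the sparsity regularizer entirely. The feature names, weights, and data are made up for illustration.

```python
import numpy as np

# Hypothetical ternary encoding of a trained, binarized rules layer
# (an assumption, not the paper's exact parameterization):
#   W[j, i] = +1 -> rule j requires feature i to be 1 (positive association)
#   W[j, i] = -1 -> rule j requires feature i to be 0 (negative association)
#   W[j, i] =  0 -> rule j ignores feature i


def rules_layer(X, W):
    """Return an (n_samples, n_rules) 0/1 matrix: which conjunctive rules fire."""
    pos = W == 1
    neg = W == -1
    # Broadcast samples against rules: shapes (n, 1, d) vs (1, r, d).
    pos_ok = np.all(X[:, None, :] >= pos[None, :, :], axis=2)
    neg_ok = np.all((1 - X[:, None, :]) >= neg[None, :, :], axis=2)
    return (pos_ok & neg_ok).astype(int)


def output_layer(R):
    """Disjunction (OR) over the fired rules: 1 if any rule fires."""
    return R.max(axis=1)


def rule_to_text(w, names):
    """Render one row of W as a readable if-then condition."""
    lits = [n if v == 1 else f"NOT {n}" for v, n in zip(w, names) if v != 0]
    return " AND ".join(lits) if lits else "TRUE"


names = ["f0", "f1", "f2"]  # hypothetical binary features
W = np.array([[1, -1, 0],   # rule 0: f0 AND NOT f1
              [0, 1, 1]])   # rule 1: f1 AND f2
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 1]])

y = output_layer(rules_layer(X, W))
for w in W:
    print("IF", rule_to_text(w, names), "THEN positive")
print(y)  # [1 0 0 1]
```

Each row of W reads directly as one if-then rule, and the output neuron is simply their OR; the -1 entries are what let a single rule express a negative association such as NOT f1.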
GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts
In the context of the rapid development of large language models, we have
meticulously trained and introduced the GujiBERT and GujiGPT language models,
which are foundational models specifically designed for intelligent information
processing of ancient texts. These models have been trained on an extensive
dataset that encompasses both simplified and traditional Chinese characters,
allowing them to effectively handle various natural language processing tasks
related to ancient books, including but not limited to automatic sentence
segmentation, punctuation, word segmentation, part-of-speech tagging, entity
recognition, and automatic translation. Notably, these models have exhibited
exceptional performance across a range of validation tasks using publicly
available datasets. Our research findings highlight the efficacy of employing
self-supervised methods to further train the models using classical text
corpora, thus enhancing their capability to tackle downstream tasks. Moreover,
it is worth emphasizing that the choice of script (simplified or traditional
characters), the scale of the corpus, and the choice of initial model all
significantly influence the final
experimental outcomes. To cater to the diverse text processing preferences of
researchers in digital humanities and linguistics, we have developed three
distinct categories comprising a total of nine model variations. We believe
that by sharing these foundational language models specialized in the domain of
ancient texts, we can facilitate the intelligent processing and scholarly
exploration of ancient literary works and, consequently, contribute to the
global dissemination of China's rich and esteemed traditional culture in this
new era.
Comment: 22 pages, 0 figures
Spatial and temporal regeneration patterns within gaps in the primary forests vs. secondary forests of Northeast China
Forest gaps play an important role during forest succession in temperate forest ecosystems. However, the differences in spatial distribution and replacement patterns of woody plants (trees and shrubs) between primary and secondary forests during gap filling remain unclear, especially for temperate forests in Northeast China. We recorded 45,619 regenerated trees and shrubs in young gaps (<10 years), old gaps (10–20 years), and closed forest stands (i.e., filled gaps) in primary broadleaved Korean pine (Pinus koraiensis Sieb. et Zucc.) forests and in secondary forests (degraded from primary forests). Gap filling along the horizontal (Cartesian coordinate system) and vertical (lower layer: 0–5 m, medium layer: 5–10 m, and upper layer: >10 m) dimensions was quantified by shade-tolerance group of trees and shrubs. We found that gap age, competition between species, and pre-existing regeneration status resulted in different species replacement patterns within gaps in primary vs. secondary forests. Gap formation increased species richness in both primary and secondary forests: 33, 38, 39, and 41 species were recorded in the primary closed stands, primary forest gaps, secondary closed stands, and secondary forest gaps, respectively. However, only 35.9% of species in primary forest gaps and 34.1% in secondary forest gaps successfully reached the upper layer. Based on the importance values (IVs) of tree species across different canopy heights, light-demanding trees in the upper layer of the secondary forests were gradually replaced by intermediate and shade-tolerant trees. In the primary forests, Korean pine exhibited intermittent growth patterns at different canopy heights, whereas in the secondary forests it regenerated continuously along the vertical height gradient. The differences in Korean pine regeneration between the primary and secondary forests existed before gap formation and persisted during gap filling.
Interspecific competition among tree species gradually decreased with increasing vertical height, and, compared with the primary forests, the secondary forests showed an earlier onset of competitive exclusion within gaps. Our findings reveal the species replacement patterns within gaps and provide further understanding of the competition dynamics among tree species during gap filling.
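The abstract compares species by importance values (IVs) across canopy heights but does not state the formula. A common definition in forest ecology, assumed here and not confirmed by the source, is IV = (relative density + relative frequency + relative dominance) / 3, with dominance measured as basal area. The sketch below applies that definition to hypothetical stem records, optionally after binning them into the lower, medium, and upper layers named in the abstract; how stems at exactly 5 m or 10 m are assigned is also an assumption.

```python
import math
from collections import defaultdict


def layer(height_m):
    """Vertical layer from the abstract; boundary handling at 5 m and 10 m is assumed."""
    if height_m < 5:
        return "lower"
    if height_m <= 10:
        return "medium"
    return "upper"


def importance_values(records):
    """records: iterable of (plot_id, species, diameter_cm), one tuple per stem.

    Returns {species: IV} where IV = (relative density + relative frequency
    + relative dominance) / 3, each in percent, so the IVs sum to 100.
    """
    count = defaultdict(int)     # stems per species
    plots = defaultdict(set)     # plots in which each species occurs
    basal = defaultdict(float)   # summed cross-sectional area, m^2
    for plot, species, d_cm in records:
        count[species] += 1
        plots[species].add(plot)
        basal[species] += math.pi * (d_cm / 200.0) ** 2  # radius in m = d_cm / 200
    total_n = sum(count.values())
    total_f = sum(len(p) for p in plots.values())
    total_ba = sum(basal.values())
    return {
        sp: 100.0 * (count[sp] / total_n + len(plots[sp]) / total_f + basal[sp] / total_ba) / 3.0
        for sp in count
    }


# Hypothetical stems: (plot_id, species, diameter_cm, height_m)
stems = [
    ("gap1", "A", 20.0, 12.0),
    ("gap1", "B", 10.0, 3.0),
    ("gap2", "A", 20.0, 14.0),
    ("gap2", "A", 20.0, 11.0),
]
overall = importance_values((p, s, d) for p, s, d, _ in stems)
upper = importance_values((p, s, d) for p, s, d, h in stems if layer(h) == "upper")
print(overall)  # A is about 77.99, B about 22.01
print(upper)    # {'A': 100.0}
```

Filtering by layer before computing IVs is what makes a per-height comparison possible: here species B contributes to the stand as a whole but is absent from the upper layer.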
Reveal a hidden highly toxic substance in biochar to support its effective elimination strategy
To develop optimized biochar with minimal contaminants, it is important to broaden our understanding of biochar. Here, we disclose for the first time a highly toxic substance (metal cyanide, MCN, such as KCN or NaCN) in biochar. The cyanide ion (CN−) content in biochar can be as high as 85,870 mg/kg and is determined by the content and type of metals inherent in the biomass: K and Na promote its formation, whereas Ca, Mg and Fe suppress it. Density functional theory (DFT) analysis shows that unstable alkali oxygen-containing metal salts such as K2CO3 can induce an N rearrangement reaction that produces, for example, KOCN. The strongly reducing carbon matrix then converts KOCN to KCN, resulting in a high-risk biochar. In contrast, the stable Mg, Ca and Fe salts in biomass cannot induce the N rearrangement reaction because of their high binding energies. We therefore propose that high-valent metal chloride salts such as FeCl3 and MgCl2 could be used to inhibit cyanide production via metal interactive reactions. These findings offer a new perspective on the potential risks of biochar and provide a mitigation strategy for its sustainable application.
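To put the maximum reported CN− content in perspective, the arithmetic below converts 85,870 mg/kg to a mass percentage and to the equivalent mass of KCN or NaCN. It assumes, hypothetically, that all of the cyanide is present as a single salt, and it uses molar masses built from rounded standard atomic weights; it is an illustration of scale, not a result from the study.

```python
# Standard atomic weights (g/mol), rounded to three decimals.
M_C, M_N, M_K, M_NA = 12.011, 14.007, 39.098, 22.990
M_CN = M_C + M_N  # cyanide ion, about 26.018 g/mol


def as_salt_mg_per_kg(cn_mg_per_kg, cation_molar_mass):
    """Mass of MCN containing the given mass of CN- (one CN- per formula unit)."""
    return cn_mg_per_kg * (cation_molar_mass + M_CN) / M_CN


cn = 85_870                         # mg CN- per kg biochar (maximum in the abstract)
kcn = as_salt_mg_per_kg(cn, M_K)    # hypothetical: all CN- present as KCN
nacn = as_salt_mg_per_kg(cn, M_NA)  # hypothetical: all CN- present as NaCN

# 1 wt% = 10,000 mg/kg
print(f"CN-:  {cn / 1e4:.2f} wt%")   # 8.59 wt%
print(f"KCN:  {kcn / 1e4:.1f} wt%")  # 21.5 wt%
print(f"NaCN: {nacn / 1e4:.1f} wt%") # 16.2 wt%
```

The conversion is linear in the CN− content, so the same function applies to any measured value.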
- …